Updates

This is RMarkdown is to summarize the changes in results that have occurred since we last met. Overall I feel these are positive developments since my confidence in the quality of the data/analysis has been increased, the system-wide color results even seem to further support our argument for autochthonization. I will include some of the prior results followed by the revisions so that you may compare them side by side. This document is subject to changes depending on what we find useful.

Water color at the whole system level

In reviewing my chromaticity code in Google Earth Engine and scrutinizing the results specifically for O.H. Ivie, I noticed that there were some portions of clouds and shadows for which DWL was being calculated. This is due to the commonly applied maskS2clouds function not fully identifying all pixels that have cloud/shadow contamination. In reviewing alternative cloud masking approaches, I came across the “Sentinel-2: Cloud Probability” ee.Image.Collection (“COPERNICUS/S2_CLOUD_PROBABILITY”) which can be combined with the Sen2Cor corrected data (“COPERNICUS/S2_SR_HARMONIZED”) to produce much more accurate identification and exclusion of clouds and associated shadows.

The pixels that had previously been misclassified as water but were actually clouds tended to reflect in the blue spectrum as seen in this mapped out figure. Comparing it with the maps that actually only calculate for cloud-free water pixels we can see clearly where some of those clouds were included:

Here is the updated plot where you can see significant portions of Bonham and Ivie excluded and hardly very little blue for any of the lakes compared to above.

Now looking at the histograms, previously our distribution was only visualized by count:

Besides updating the actual cloud free color distribution, we also updated the histogram to show proportion of total area for each range of wavelengths. This shows clearly what our results paragraph states about how much of the surface area reflects green (550-570, FUI 9-14) compared to above that (yellow-brown) or below that (blue-green to blue), the latter categories are the minority for all lakes.

Looking at NDTI to Turbidity from Sen2Cor vs. ACOLITE

Our approach to Sen2Cor has been pretty non-standardized and might still need some work to address the concerns about poor model fit. I decided to use the same approach I took to the ACOLITE data and query the Sentinel2CloudProb images for points along boat path in R and did not get a much stronger correlation than we were finding with our own piecemeal approach.

s2flame_znpts %>% dplyr::group_by(system) %>% dplyr::summarise(count = n())
## # A tibble: 6 × 2
##   system    count
##   <chr>     <int>
## 1 arrowhead  4705
## 2 bonham     3952
## 3 brownwood  6051
## 4 ivie       6929
## 5 redbluff   5103
## 6 waco       1588
s2flm_allfilter<-s2flame_znpts  %>% filter(SCL == "6")%>% 
  filter(!system=="waco")

ggplot(s2flm_allfilter,aes(turb, ndti)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  stat_regline_equation(aes(label = ..rr.label..)) + theme_bw()
## `geom_smooth()` using formula = 'y ~ x'

When we exclude the 1588 Waco points from the total of 28328 the r2 goes from 0.57 to 0.7 for the sen2cor data downloaded from ee.points app and similar flagging approach to Arrowhead paper.

The following data is for the cloud prob data where band values were extracted in R. Without excluding Waco the r2 is 0.52 compared to 0.56 when those 1201 points were excluded. The total n = 27820.

s2flm_nc %>% dplyr::group_by(system) %>% dplyr::summarise(count = n())
## # A tibble: 6 × 2
##   system    count
##   <chr>     <int>
## 1 arrowhead  4709
## 2 bonham     4473
## 3 brownwood  5744
## 4 ivie       6734
## 5 redbluff   4959
## 6 waco       1201
s2flm_ncfilter<-s2flm_nc  %>%
  filter(!system=="waco")

ggplot(s2flm_ncfilter,aes(turb, ndti)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  stat_regline_equation(aes(label = ..rr.label..)) + theme_bw()
## `geom_smooth()` using formula = 'y ~ x'

Next up you can see that when plotting the ACOLITE L2W data, it automatically excludes 10647 points because the NDTI comes up as NA. The summarise(count = n()) below shows how many of those points are excluded for each lake, this includes all from Waco.

s2flm_l2w %>% dplyr::group_by(system) %>% dplyr::summarise(count = n())
## # A tibble: 6 × 2
##   system    count
##   <chr>     <int>
## 1 arrowhead  4705
## 2 bonham     2774
## 3 brownwood  5993
## 4 ivie       5290
## 5 redbluff   5067
## 6 waco       1545
ggplot(s2flm_l2w,aes(turb, ndti)) + 
  geom_point() + 
  geom_smooth(method = "lm", se=FALSE) +
  stat_regline_equation(aes(label = ..rr.label..)) + theme_bw()
## `geom_smooth()` using formula = 'y ~ x'

s2flm_l2w_nona<-s2flm_l2w %>% drop_na()

s2flm_l2w_nona %>% dplyr::group_by(system) %>% dplyr::summarise(count = n())
## # A tibble: 5 × 2
##   system    count
##   <chr>     <int>
## 1 arrowhead  4404
## 2 bonham     2774
## 3 brownwood  1837
## 4 ivie       5290
## 5 redbluff    422
s2c_lm = lm(turb~ndti, s2flm_allfilter)
summary(s2c_lm)
## 
## Call:
## lm(formula = turb ~ ndti, data = s2flm_allfilter)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -85.723  -4.940  -0.748   4.586  40.910 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  40.7595     0.1251   325.7   <2e-16 ***
## ndti        144.2411     0.5808   248.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.028 on 26637 degrees of freedom
## Multiple R-squared:  0.6984, Adjusted R-squared:  0.6983 
## F-statistic: 6.167e+04 on 1 and 26637 DF,  p-value: < 2.2e-16
plot(s2c_lm, which = c(1,2))